Structural Risk Minimization


Structural Risk Minimization for Character Recognition

Neural Information Processing Systems

The method of Structural Risk Minimization refers to tuning the capacity of the classifier to the available amount of training data. This capacity is influenced by several factors, including: (1) properties of the input space, (2) nature and structure of the classifier, and (3) learning algorithm. Actions based on these three factors are combined here to control the capacity of linear classifiers and improve generalization on the problem of handwritten digit recognition. A common way of training a given classifier is to adjust the parameters w in the classification function F(x, w) to minimize the training error Etrain, i.e. the frequency of errors on a set of p training examples. Etrain estimates the expected risk based on the empirical data provided by the p available examples.
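In symbols (with $y_i$ denoting the label of example $x_i$, a notation the abstract leaves implicit), the empirical risk being minimized is:

```latex
E_{\mathrm{train}}(w) \;=\; \frac{1}{p}\sum_{i=1}^{p} \mathbf{1}\!\left[\,F(x_i, w) \neq y_i\,\right]
```

Structural Risk Minimization then augments this training error with a capacity term, so that the selected classifier is tuned to the amount of available data rather than merely fitting the p examples.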


Structural Risk Minimization for Nonparametric Time Series Prediction

Neural Information Processing Systems

The problem of time series prediction is studied within the uniform convergence framework of Vapnik and Chervonenkis. The dependence inherent in the temporal structure is incorporated into the analysis, thereby generalizing the available theory for memoryless processes. Finite sample bounds are calculated in terms of covering numbers of the approximating class, and the tradeoff between approximation and estimation is discussed. A complexity regularization approach is outlined, based on Vapnik's method of Structural Risk Minimization, and shown to be applicable in the context of mixing stochastic processes. A great deal of effort has been expended in recent years on the problem of deriving robust distribution-free error bounds for learning, mainly in the context of memoryless processes (e.g., ...). On the other hand, an extensive amount of work has been devoted by statisticians and econometricians to the study of parametric (often linear) models of time series, where the dependence inherent in the sample precludes straightforward application of many of the standard results from the theory of memoryless processes.
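Schematically, complexity regularization selects over a nested sequence of approximating classes by minimizing the empirical risk plus a class-dependent penalty. The notation below is ours; the paper's actual penalty is expressed through the covering numbers mentioned above:

```latex
\hat{f}_n \;=\; \operatorname*{arg\,min}_{k \ge 1}\; \min_{f \in \mathcal{F}_k} \left\{ \hat{L}_n(f) + \Delta_n(k) \right\},
\qquad \mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots
```

Larger $k$ reduces the approximation error while the penalty $\Delta_n(k)$ controls the estimation error; this is the tradeoff discussed in the abstract.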


Dyadic Classification Trees via Structural Risk Minimization

Scott, Clayton, Nowak, Robert

Neural Information Processing Systems

Classification trees are one of the most popular types of classifiers, with ease of implementation and interpretation being among their attractive features. Despite the widespread use of classification trees, theoretical analysis of their performance is scarce. In this paper, we show that a new family of classification trees, called dyadic classification trees (DCTs), are near optimal (in a minimax sense) for a very broad range of classification problems. This demonstrates that other schemes (e.g., neural networks, support vector machines) cannot perform significantly better than DCTs in many cases. We also show that this near optimal performance is attained with linear (in the number of training data) complexity growing and pruning algorithms. Moreover, the performance of DCTs on benchmark datasets compares favorably to that of standard CART, which is generally more computationally intensive and which does not possess similar near optimality properties. Our analysis stems from theoretical results on structural risk minimization, on which the pruning rule for DCTs is based.
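As a minimal illustration of the grow-and-prune idea, the sketch below builds a recursive dyadic partition of [0, 1) on one-dimensional data and prunes it bottom-up under a per-leaf penalty `lam`. The penalty is a hypothetical stand-in for the SRM-based pruning rule; the paper's actual penalty and its multidimensional partitioning scheme differ.

```python
import numpy as np

def grow_prune(x, y, lo=0.0, hi=1.0, depth=4, lam=2.0):
    """Grow a dyadic tree on 1-D data in [lo, hi) and prune it bottom-up.

    Returns (tree, penalized_error), where a tree is either
    ('leaf', label) or ('split', mid, left, right). The penalty is
    lam per leaf -- an illustrative stand-in for an SRM-style
    complexity penalty.
    """
    labels = y[(x >= lo) & (x < hi)]
    if labels.size == 0:
        return ('leaf', 0), lam
    # Majority-vote leaf and its penalized empirical error.
    majority = int(np.bincount(labels).argmax())
    leaf_cost = np.sum(labels != majority) + lam
    if depth == 0:
        return ('leaf', majority), leaf_cost
    # Recursively grow and prune the two dyadic children.
    mid = 0.5 * (lo + hi)
    left, cost_l = grow_prune(x, y, lo, mid, depth - 1, lam)
    right, cost_r = grow_prune(x, y, mid, hi, depth - 1, lam)
    # Pruning rule: keep the split only if it lowers the penalized error.
    if cost_l + cost_r < leaf_cost:
        return ('split', mid, left, right), cost_l + cost_r
    return ('leaf', majority), leaf_cost

def predict(tree, xi):
    if tree[0] == 'leaf':
        return tree[1]
    _, mid, left, right = tree
    return predict(left, xi) if xi < mid else predict(right, xi)

# Toy usage: labels flip at x = 0.37.
rng = np.random.default_rng(0)
x = rng.random(500)
y = (x > 0.37).astype(int)
tree, _ = grow_prune(x, y, depth=6, lam=2.0)
print([predict(tree, v) for v in (0.1, 0.5, 0.9)])  # [0, 1, 1]
```

The pruning step is a single bottom-up dynamic program over the grown partition, mirroring the computational appeal the abstract highlights.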


Structural risk minimization for quantum linear classifiers

Gyurik, Casper, van Vreumingen, Dyon, Dunjko, Vedran

arXiv.org Artificial Intelligence

Quantum machine learning (QML) models based on parameterized quantum circuits are often highlighted as candidates for quantum computing's near-term "killer application". However, the understanding of the empirical and generalization performance of these models is still in its infancy. In this paper we study how to balance training accuracy against generalization performance (also called structural risk minimization) for two prominent QML models introduced by Havlíček et al. (Nature, 2019) and Schuld and Killoran (PRL, 2019). Firstly, using relationships to well-understood classical models, we prove that two model parameters -- i.e., the dimension of the sum of the images and the Frobenius norm of the observables used by the model -- closely control the models' complexity and therefore their generalization performance. Secondly, using ideas inspired by process tomography, we prove that these model parameters also closely control the models' ability to capture correlations in sets of training examples. In summary, our results give rise to new options for structural risk minimization for QML models.


Encoding-dependent generalization bounds for parametrized quantum circuits

Caro, Matthias C., Gil-Fuster, Elies, Meyer, Johannes Jakob, Eisert, Jens, Sweke, Ryan

arXiv.org Machine Learning

A large body of recent work has begun to explore the potential of parametrized quantum circuits (PQCs) as machine learning models, within the framework of hybrid quantum-classical optimization. In particular, theoretical guarantees on the out-of-sample performance of such models, in terms of generalization bounds, have emerged. However, none of these generalization bounds depend explicitly on how the classical input data is encoded into the PQC. We derive generalization bounds for PQC-based models that depend explicitly on the strategy used for data-encoding. These imply bounds on the performance of trained PQC-based models on unseen data. Moreover, our results facilitate the selection of optimal data-encoding strategies via structural risk minimization, a mathematically rigorous framework for model selection. We obtain our generalization bounds by bounding the complexity of PQC-based models as measured by the Rademacher complexity and the metric entropy, two complexity measures from statistical learning theory. To achieve this, we rely on a representation of PQC-based models via trigonometric functions. Our generalization bounds emphasize the importance of well-considered data-encoding strategies for PQC-based models.
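The trigonometric representation can be made tangible with a toy simulation: a single-qubit data re-uploading circuit, evaluated exactly with 2x2 matrices, yields an expectation value whose Fourier spectrum contains only the frequencies admitted by the encoding. The circuit layout and parameters below are our own illustrative choices, not the paper's construction.

```python
import numpy as np

def rz(t):  # single-qubit Z rotation (data encoding)
    return np.diag([np.exp(-0.5j * t), np.exp(0.5j * t)])

def ry(t):  # single-qubit Y rotation (trainable)
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]])

def pqc_output(x, thetas):
    """<Z> of the circuit RY(th_L) RZ(x) ... RZ(x) RY(th_0) |0>."""
    psi = np.array([1.0, 0.0], dtype=complex)
    psi = ry(thetas[0]) @ psi
    for th in thetas[1:]:
        psi = ry(th) @ rz(x) @ psi
    z = np.diag([1.0, -1.0])
    return (psi.conj() @ z @ psi).real

L = 3  # number of encoding repetitions
rng = np.random.default_rng(1)
thetas = rng.uniform(0, 2 * np.pi, L + 1)
xs = np.linspace(0, 2 * np.pi, 64, endpoint=False)
fs = np.array([pqc_output(x, thetas) for x in xs])
# Frequencies above L vanish: the model is a degree-L trigonometric polynomial.
spectrum = np.abs(np.fft.rfft(fs)) / len(xs)
print(np.round(spectrum[:8], 4))
```

Repeating the encoding gate enriches the accessible frequency spectrum, which is one concrete sense in which generalization bounds can depend on the data-encoding strategy.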


Learning Theory and Support Vector Machines - a primer

Banf, Michael

arXiv.org Machine Learning

The main goal of statistical learning theory is to provide a fundamental framework for the problem of decision making and model construction based on sets of data. Here, we present a brief introduction to the fundamentals of statistical learning theory, in particular the difference between empirical and structural risk minimization, including one of the latter's most prominent implementations, the Support Vector Machine.
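To make the ERM/SRM distinction concrete, here is a hedged scikit-learn sketch that fits SVMs over a nested family of polynomial-kernel classes of growing degree. ERM alone would favor the degree with the lowest training error; SRM trades training fit against capacity, proxied here pragmatically by cross-validated error rather than a VC-type bound.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

for degree in (1, 2, 3, 5, 8):
    clf = SVC(kernel="poly", degree=degree, C=1.0).fit(X, y)
    train_err = 1.0 - clf.score(X, y)  # empirical risk
    cv_err = 1.0 - cross_val_score(
        SVC(kernel="poly", degree=degree, C=1.0), X, y, cv=5
    ).mean()  # capacity-sensitive proxy for expected risk
    print(f"degree={degree}: train error {train_err:.3f}, CV error {cv_err:.3f}")
```

Training error tends to shrink as the degree grows, while the cross-validated error bottoms out at an intermediate capacity; picking that minimum is the SRM recipe in miniature.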


Structural Return Maximization for Reinforcement Learning

Joseph, Joshua, Velez, Javier, Roy, Nicholas

arXiv.org Machine Learning

Reinforcement Learning (RL) (Sutton & Barto, 1998) is a framework for sequential decision making under uncertainty with the objective of finding a policy that maximizes the sum of rewards, or return, of an agent. A straightforward model-based approach to batch RL, where the algorithm learns a policy from a fixed set of data, is to fit a dynamics model by minimizing a form of prediction error (e.g., minimum squared error) and then compute the optimal policy with respect to the learned model (Bertsekas, 2000). As discussed in Baxter & Bartlett (2001) and Joseph et al. (2013), learning a model for RL by minimizing prediction error can result in a policy that performs arbitrarily poorly for unfavorably chosen model classes. To overcome this limitation, a second approach is to not use a model and directly learn the policy from a policy class that explicitly maximizes an estimate of return (Meuleau et al., 2000). With limited data, approaches that explicitly maximize estimated return are vulnerable to learning policies which perform poorly since the return cannot be confidently estimated.
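The model-based baseline the abstract describes can be sketched for a scalar linear-quadratic problem: fit the dynamics by least squares (minimum squared prediction error), then compute the optimal policy for the fitted model. The setting and constants are our own illustrative choices, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Batch data from unknown scalar dynamics x' = a*x + b*u + noise.
a_true, b_true = 0.9, 0.5
xs = rng.normal(size=1000)
us = rng.normal(size=1000)
xnext = a_true * xs + b_true * us + 0.1 * rng.normal(size=1000)

# Step 1: fit the dynamics model by minimizing squared prediction error.
(a_hat, b_hat), *_ = np.linalg.lstsq(np.column_stack([xs, us]), xnext, rcond=None)

# Step 2: optimal policy u = -k*x for the fitted model under cost
# sum(x^2 + u^2), via the scalar discrete-time Riccati recursion.
p = 1.0
for _ in range(500):
    p = 1.0 + a_hat**2 * p - (a_hat * b_hat * p) ** 2 / (1.0 + b_hat**2 * p)
k = a_hat * b_hat * p / (1.0 + b_hat**2 * p)
print(f"fitted a={a_hat:.3f}, b={b_hat:.3f}, policy gain k={k:.3f}")
```

The failure mode the abstract points to arises when the model class cannot represent the true dynamics: the least-squares fit can then induce a certainty-equivalent policy with arbitrarily poor return, which motivates selecting policies by reasoning about return directly.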

